Hungarian Word-Sense Disambiguated Corpus

نویسندگان

Veronika Vincze

György Szarvas

Attila Almási

Dóra Szauter

Róbert Ormándi

Richárd Farkas

Csaba Hatvani

János Csirik

چکیده

To create the first Hungarian WSD corpus, 39 suitable word form samples were selected for the purpose of word sense disambiguation. Among others, selection criteria required the given word form to be frequent in Hungarian language usage (frequency rates available in the Hungarian National Corpus (HNC) were used for measurement (Váradi, 2000)), and to have more than one sense considered frequent in usage. HNC and its Heti Világgazdaság (HVG) subcorpus provided the basis for corpus text selection. This way, each sample has a relevant context (the whole HVG article), and information on the lemma, POS-tagging and automatic tokenization is also

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Manually Annotated Hungarian Corpus

Current paper presents the results of a two-year project during which a consortium of the University of Szeged and the MorphoLogic Ltd. Budapest developed a morpho-syntactically parsed and annotated (disambiguated) corpus for Hungarian. For morpho-syntactic encoding, the Hungarian version of MSD (MorphoSyntactic Description) has been used. The corpus contains texts of five different topic areas...

متن کامل

An Iterative Approach to Word Sense Disambiguation

In this paper, we present an iterative algorithm for Word Sense Disambiguation. It combines two sources of information: Word_Net and a semantic tagged corpus, for the purpose of identifying the correct sense of the words in a given text. It differs from other standard approaches in that the disambiguation process is performed in an iterative manner: starting from free text, a set of disambiguat...

متن کامل

Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation

In this paper we present an experiment to automatically generate annotated training corpora for a supervised word sense disambiguation module operating in an English-Hungarian and a Hungarian-English machine translation system. Training examples for the WSD module are produced by annotating ambiguous lexical items in the source language (words having several possible translations) with their pr...

متن کامل

Adaptive Word Sense Disambiguation Using Lexical Knowledge in Machine-readable Dictionary

This paper describes a general framework for adaptive conceptual word sense disambiguation. The proposed system begins with knowledge acquisition from machine-readable dictionaries. Central to the approach is the adaptive step that enriches the initial knowledge base with knowledge gleaned from the partial disambiguated text. Once the knowledge base is adjusted to suit the text at hand, it is a...

متن کامل

A Naïve Bayes Approach for Word Sense Disambiguation

The word sense disambiguation (WSD) is the task ofautomatically selecting the correct sense given a context and it helps in solving many ambiguity problems inherently existing in all natural languages.Statistical Natural Language Processing (NLP),which is based on probabilistic, stochastic and statistical methods, has been used to solve many NLP problems.The Naive Bayes algorithm which is one o...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2008

Hungarian Word-Sense Disambiguated Corpus

نویسندگان

چکیده

منابع مشابه

Manually Annotated Hungarian Corpus

An Iterative Approach to Word Sense Disambiguation

Exploiting Parallel Corpora for Supervised Word-Sense Disambiguation in English-Hungarian Machine Translation

Adaptive Word Sense Disambiguation Using Lexical Knowledge in Machine-readable Dictionary

A Naïve Bayes Approach for Word Sense Disambiguation

عنوان ژورنال:

اشتراک گذاری